Overview

Dataset statistics

Number of variables: 25
Number of observations: 30000
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 5.7 MiB
Average record size in memory: 200.0 B

Variable types

Numeric: 22
Categorical: 3

Warnings

BILL_AMT1 is highly correlated with BILL_AMT2 [High correlation]
BILL_AMT2 is highly correlated with BILL_AMT1 and 1 other field [High correlation]
BILL_AMT3 is highly correlated with BILL_AMT2 and 1 other field [High correlation]
BILL_AMT4 is highly correlated with BILL_AMT3 and 2 other fields [High correlation]
BILL_AMT5 is highly correlated with BILL_AMT4 and 1 other field [High correlation]
BILL_AMT6 is highly correlated with BILL_AMT4 and 1 other field [High correlation]
PAY_AMT2 is highly skewed (γ1 = 30.45381745) [Skewed]
ID is uniformly distributed [Uniform]
ID has unique values [Unique]
PAY_0 has 14737 (49.1%) zeros [Zeros]
PAY_2 has 15730 (52.4%) zeros [Zeros]
PAY_3 has 15764 (52.5%) zeros [Zeros]
PAY_4 has 16455 (54.9%) zeros [Zeros]
PAY_5 has 16947 (56.5%) zeros [Zeros]
PAY_6 has 16286 (54.3%) zeros [Zeros]
BILL_AMT1 has 2008 (6.7%) zeros [Zeros]
BILL_AMT2 has 2506 (8.4%) zeros [Zeros]
BILL_AMT3 has 2870 (9.6%) zeros [Zeros]
BILL_AMT4 has 3195 (10.7%) zeros [Zeros]
BILL_AMT5 has 3506 (11.7%) zeros [Zeros]
BILL_AMT6 has 4020 (13.4%) zeros [Zeros]
PAY_AMT1 has 5249 (17.5%) zeros [Zeros]
PAY_AMT2 has 5396 (18.0%) zeros [Zeros]
PAY_AMT3 has 5968 (19.9%) zeros [Zeros]
PAY_AMT4 has 6408 (21.4%) zeros [Zeros]
PAY_AMT5 has 6703 (22.3%) zeros [Zeros]
PAY_AMT6 has 7173 (23.9%) zeros [Zeros]

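Each Zeros warning reports the count and the share of exact zeros in a column. The sketch below shows only that arithmetic. The function name `zeros_warning` and its `threshold` parameter are hypothetical, because the report does not state the cut-off pandas-profiling applies. All the report shows is that columns with 6.7% zeros or more were flagged and that EDUCATION, with 14 zeros (< 0.1%), was not.

```python
def zeros_warning(name, values, threshold):
    """Return a report-style warning if the share of zeros exceeds `threshold` (a fraction).

    `threshold` is a hypothetical parameter; pandas-profiling's actual default
    is not stated in this report.
    """
    n_zeros = sum(1 for v in values if v == 0)
    share = n_zeros / len(values)
    if share <= threshold:
        return None
    return f"{name} has {n_zeros} ({share:.1%}) zeros"

# Synthetic column with the same zero count as PAY_0 (14737 of 30000 rows).
pay_0_like = [0] * 14737 + [1] * (30000 - 14737)
print(zeros_warning("PAY_0", pay_0_like, threshold=0.05))
```
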
Reproduction

Analysis started: 2021-04-29 05:53:08.436883
Analysis finished: 2021-04-29 05:55:15.561317
Duration: 2 minutes and 7.12 seconds
Software version: pandas-profiling v2.11.0
Download configuration: config.yaml

Variables

ID
Real number (ℝ≥0)

UNIFORM
UNIQUE

Distinct: 30000
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 15000.5
Minimum: 1
Maximum: 30000
Zeros: 0
Zeros (%): 0.0%
Memory size: 234.5 KiB

Quantile statistics

Minimum: 1
5th percentile: 1500.95
Q1: 7500.75
Median: 15000.5
Q3: 22500.25
95th percentile: 28500.05
Maximum: 30000
Range: 29999
Interquartile range (IQR): 14999.5

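The ID quantiles above (5th percentile 1500.95, Q1 7500.75, and so on) match linear interpolation between neighbouring order statistics, which is the default quantile method in NumPy and pandas. The sketch below assumes that method and reproduces the quantiles for ID = 1..30000. The function name `quantile_linear` is our own.

```python
import math

def quantile_linear(sorted_xs, q):
    """Quantile by linear interpolation between closest ranks (NumPy's default 'linear' method)."""
    pos = (len(sorted_xs) - 1) * q
    lo = math.floor(pos)
    hi = min(lo + 1, len(sorted_xs) - 1)
    return sorted_xs[lo] + (sorted_xs[hi] - sorted_xs[lo]) * (pos - lo)

ids = list(range(1, 30001))  # ID is 1..30000, strictly increasing, so already sorted
stats = {q: quantile_linear(ids, q) for q in (0.05, 0.25, 0.5, 0.75, 0.95)}
iqr = stats[0.75] - stats[0.25]
for q, v in stats.items():
    print(f"{q:.2f}: {v:.2f}")
print(f"IQR: {iqr:.2f}")
```
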
Descriptive statistics

Standard deviation: 8660.398374
Coefficient of variation (CV): 0.5773406469
Kurtosis: -1.2
Mean: 15000.5
Median Absolute Deviation (MAD): 7500
Skewness: 0
Sum: 450015000
Variance: 75002500
Monotonicity: Strictly increasing

Histogram with fixed size bins (bins=50)

Common values

Value | Count | Frequency (%)
2047 | 1 | < 0.1%
1322 | 1 | < 0.1%
15629 | 1 | < 0.1%
9486 | 1 | < 0.1%
11535 | 1 | < 0.1%
21792 | 1 | < 0.1%
23841 | 1 | < 0.1%
17698 | 1 | < 0.1%
19747 | 1 | < 0.1%
29988 | 1 | < 0.1%
Other values (29990) | 29990 | > 99.9%

Minimum 10 values

Value | Count | Frequency (%)
1 | 1 | < 0.1%
2 | 1 | < 0.1%
3 | 1 | < 0.1%
4 | 1 | < 0.1%
5 | 1 | < 0.1%
6 | 1 | < 0.1%
7 | 1 | < 0.1%
8 | 1 | < 0.1%
9 | 1 | < 0.1%
10 | 1 | < 0.1%

Maximum 10 values

Value | Count | Frequency (%)
30000 | 1 | < 0.1%
29999 | 1 | < 0.1%
29998 | 1 | < 0.1%
29997 | 1 | < 0.1%
29996 | 1 | < 0.1%
29995 | 1 | < 0.1%
29994 | 1 | < 0.1%
29993 | 1 | < 0.1%
29992 | 1 | < 0.1%
29991 | 1 | < 0.1%

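Several ID descriptive statistics follow directly from ID being the integers 1 to 30000. The sketch below uses Python's `statistics` module to reproduce the values above. It assumes two things: the report uses the sample (n − 1) variance, and it defines MAD as the median of absolute deviations from the median.

```python
import statistics

ids = list(range(1, 30001))

mean = statistics.mean(ids)            # 15000.5
variance = statistics.variance(ids)    # sample variance (divides by n - 1)
std = statistics.stdev(ids)
cv = std / mean                        # coefficient of variation
median = statistics.median(ids)
mad = statistics.median(abs(x - median) for x in ids)  # median absolute deviation
total = sum(ids)

print(mean, variance, round(std, 6), round(cv, 10), mad, total)
```

The closed forms explain the numbers. For 1..n the sample variance is n(n + 1)/12 = 30000 × 30001 / 12 = 75002500, and the sum is n(n + 1)/2 = 450015000.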
LIMIT_BAL
Real number (ℝ≥0)

Distinct: 81
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 167484.3227
Minimum: 10000
Maximum: 1000000
Zeros: 0
Zeros (%): 0.0%
Memory size: 234.5 KiB

Quantile statistics

Minimum: 10000
5th percentile: 20000
Q1: 50000
Median: 140000
Q3: 240000
95th percentile: 430000
Maximum: 1000000
Range: 990000
Interquartile range (IQR): 190000

Descriptive statistics

Standard deviation: 129747.6616
Coefficient of variation (CV): 0.7746854124
Kurtosis: 0.5362628964
Mean: 167484.3227
Median Absolute Deviation (MAD): 90000
Skewness: 0.9928669605
Sum: 5024529680
Variance: 1.683445568 × 10^10
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values

Value | Count | Frequency (%)
50000 | 3365 | 11.2%
20000 | 1976 | 6.6%
30000 | 1610 | 5.4%
80000 | 1567 | 5.2%
200000 | 1528 | 5.1%
150000 | 1110 | 3.7%
100000 | 1048 | 3.5%
180000 | 995 | 3.3%
360000 | 881 | 2.9%
60000 | 825 | 2.8%
Other values (71) | 15095 | 50.3%

Minimum 10 values

Value | Count | Frequency (%)
10000 | 493 | 1.6%
16000 | 2 | < 0.1%
20000 | 1976 | 6.6%
30000 | 1610 | 5.4%
40000 | 230 | 0.8%
50000 | 3365 | 11.2%
60000 | 825 | 2.8%
70000 | 731 | 2.4%
80000 | 1567 | 5.2%
90000 | 651 | 2.2%

Maximum 10 values

Value | Count | Frequency (%)
1000000 | 1 | < 0.1%
800000 | 2 | < 0.1%
780000 | 2 | < 0.1%
760000 | 1 | < 0.1%
750000 | 4 | < 0.1%
740000 | 2 | < 0.1%
730000 | 2 | < 0.1%
720000 | 3 | < 0.1%
710000 | 6 | < 0.1%
700000 | 8 | < 0.1%

SEX
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 234.5 KiB
2: 18112
1: 11888

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 30000
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2
2nd row: 2
3rd row: 2
4th row: 2
5th row: 1

Value | Count | Frequency (%)
2 | 18112 | 60.4%
1 | 11888 | 39.6%

Histogram of lengths of the category

Value | Count | Frequency (%)
2 | 18112 | 60.4%
1 | 11888 | 39.6%

Most occurring characters

Value | Count | Frequency (%)
2 | 18112 | 60.4%
1 | 11888 | 39.6%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 30000 | 100.0%

Most frequent character per category

Value | Count | Frequency (%)
2 | 18112 | 60.4%
1 | 11888 | 39.6%

Most occurring scripts

Value | Count | Frequency (%)
Common | 30000 | 100.0%

Most frequent character per script

Value | Count | Frequency (%)
2 | 18112 | 60.4%
1 | 11888 | 39.6%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 30000 | 100.0%

Most frequent character per block

Value | Count | Frequency (%)
2 | 18112 | 60.4%
1 | 11888 | 39.6%

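The Characters and Unicode tables for SEX count code points by Unicode general category, script and block. The sketch below reproduces the character, category and block counts with Python's standard `unicodedata` module. Three simplifications apply. `unicodedata` exposes general categories but not scripts, so script counting is left out. The block check only tells ASCII apart from everything else. `CATEGORY_NAMES` is a hypothetical, partial lookup from category codes to the long names the report shows.

```python
import unicodedata
from collections import Counter

# Partial, hand-written map from Unicode general-category codes to long names.
CATEGORY_NAMES = {"Nd": "Decimal Number", "Lu": "Uppercase Letter",
                  "Ll": "Lowercase Letter", "Zs": "Space Separator"}

def character_profile(values):
    chars, categories, blocks = Counter(), Counter(), Counter()
    for text in values:
        for ch in text:
            chars[ch] += 1
            code = unicodedata.category(ch)
            categories[CATEGORY_NAMES.get(code, code)] += 1
            # Only the ASCII block is detected; everything else is lumped together.
            blocks["ASCII" if ord(ch) < 128 else "non-ASCII"] += 1
    return chars, categories, blocks

# SEX stored as one-character strings, with the counts from the report.
sex = ["2"] * 18112 + ["1"] * 11888
chars, categories, blocks = character_profile(sex)
print(chars.most_common(), dict(categories), dict(blocks))
```
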
EDUCATION
Real number (ℝ≥0)

Distinct: 7
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.853133333
Minimum: 0
Maximum: 6
Zeros: 14
Zeros (%): < 0.1%
Memory size: 234.5 KiB

Quantile statistics

Minimum: 0
5th percentile: 1
Q1: 1
Median: 2
Q3: 2
95th percentile: 3
Maximum: 6
Range: 6
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 0.7903486597
Coefficient of variation (CV): 0.426493143
Kurtosis: 2.078621603
Mean: 1.853133333
Median Absolute Deviation (MAD): 1
Skewness: 0.9709720486
Sum: 55594
Variance: 0.6246510039
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=7)

Common values

Value | Count | Frequency (%)
2 | 14030 | 46.8%
1 | 10585 | 35.3%
3 | 4917 | 16.4%
5 | 280 | 0.9%
4 | 123 | 0.4%
6 | 51 | 0.2%
0 | 14 | < 0.1%

Minimum 10 values

Value | Count | Frequency (%)
0 | 14 | < 0.1%
1 | 10585 | 35.3%
2 | 14030 | 46.8%
3 | 4917 | 16.4%
4 | 123 | 0.4%
5 | 280 | 0.9%
6 | 51 | 0.2%

Maximum 10 values

Value | Count | Frequency (%)
6 | 51 | 0.2%
5 | 280 | 0.9%
4 | 123 | 0.4%
3 | 4917 | 16.4%
2 | 14030 | 46.8%
1 | 10585 | 35.3%
0 | 14 | < 0.1%

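Each Common values table lists the most frequent values and folds the rest into an "Other values (k)" row. The sketch below builds such a table with `collections.Counter` from the EDUCATION counts in the report. The function names are ours. The "< 0.1%" and "> 99.9%" formatting rule is an assumption modelled on how the tables display very small and very large shares.

```python
from collections import Counter

def fmt_share(share):
    """Format a fraction the way the report's tables appear to (assumed rule)."""
    if 0 < share < 0.001:
        return "< 0.1%"
    if 0.999 < share < 1:
        return "> 99.9%"
    return f"{share:.1%}"

def common_values(values, top=10):
    counts = Counter(values)
    n = len(values)
    rows = [(value, count, fmt_share(count / n)) for value, count in counts.most_common(top)]
    other_distinct = len(counts) - len(rows)
    if other_distinct:
        other_count = n - sum(count for _, count, _ in rows)
        rows.append((f"Other values ({other_distinct})", other_count, fmt_share(other_count / n)))
    return rows

education_counts = {2: 14030, 1: 10585, 3: 4917, 5: 280, 4: 123, 6: 51, 0: 14}
education = [v for v, c in education_counts.items() for _ in range(c)]
for row in common_values(education, top=3):
    print(" | ".join(str(x) for x in row))
```

With `top=3`, the four remaining codes (468 rows) collapse into a single "Other values (4)" row. With the default `top=10`, all seven codes are listed, matching the table above.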
MARRIAGE
Categorical

Distinct: 4
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 234.5 KiB
2: 15964
1: 13659
3: 323
0: 54

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 30000
Distinct characters: 4
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1
2nd row: 2
3rd row: 2
4th row: 1
5th row: 1

Value | Count | Frequency (%)
2 | 15964 | 53.2%
1 | 13659 | 45.5%
3 | 323 | 1.1%
0 | 54 | 0.2%

Histogram of lengths of the category

Value | Count | Frequency (%)
2 | 15964 | 53.2%
1 | 13659 | 45.5%
3 | 323 | 1.1%
0 | 54 | 0.2%

Most occurring characters

Value | Count | Frequency (%)
2 | 15964 | 53.2%
1 | 13659 | 45.5%
3 | 323 | 1.1%
0 | 54 | 0.2%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 30000 | 100.0%

Most frequent character per category

Value | Count | Frequency (%)
2 | 15964 | 53.2%
1 | 13659 | 45.5%
3 | 323 | 1.1%
0 | 54 | 0.2%

Most occurring scripts

Value | Count | Frequency (%)
Common | 30000 | 100.0%

Most frequent character per script

Value | Count | Frequency (%)
2 | 15964 | 53.2%
1 | 13659 | 45.5%
3 | 323 | 1.1%
0 | 54 | 0.2%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 30000 | 100.0%

Most frequent character per block

Value | Count | Frequency (%)
2 | 15964 | 53.2%
1 | 13659 | 45.5%
3 | 323 | 1.1%
0 | 54 | 0.2%

AGE
Real number (ℝ≥0)

Distinct: 56
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 35.4855
Minimum: 21
Maximum: 79
Zeros: 0
Zeros (%): 0.0%
Memory size: 234.5 KiB

Quantile statistics

Minimum: 21
5th percentile: 23
Q1: 28
Median: 34
Q3: 41
95th percentile: 53
Maximum: 79
Range: 58
Interquartile range (IQR): 13

Descriptive statistics

Standard deviation: 9.217904068
Coefficient of variation (CV): 0.2597653709
Kurtosis: 0.04430337824
Mean: 35.4855
Median Absolute Deviation (MAD): 6
Skewness: 0.7322458688
Sum: 1064565
Variance: 84.96975541
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values

Value | Count | Frequency (%)
29 | 1605 | 5.3%
27 | 1477 | 4.9%
28 | 1409 | 4.7%
30 | 1395 | 4.7%
26 | 1256 | 4.2%
31 | 1217 | 4.1%
25 | 1186 | 4.0%
34 | 1162 | 3.9%
32 | 1158 | 3.9%
33 | 1146 | 3.8%
Other values (46) | 16989 | 56.6%

Minimum 10 values

Value | Count | Frequency (%)
21 | 67 | 0.2%
22 | 560 | 1.9%
23 | 931 | 3.1%
24 | 1127 | 3.8%
25 | 1186 | 4.0%
26 | 1256 | 4.2%
27 | 1477 | 4.9%
28 | 1409 | 4.7%
29 | 1605 | 5.3%
30 | 1395 | 4.7%

Maximum 10 values

Value | Count | Frequency (%)
79 | 1 | < 0.1%
75 | 3 | < 0.1%
74 | 1 | < 0.1%
73 | 4 | < 0.1%
72 | 3 | < 0.1%
71 | 3 | < 0.1%
70 | 10 | < 0.1%
69 | 15 | 0.1%
68 | 5 | < 0.1%
67 | 16 | 0.1%

PAY_0
Real number (ℝ)

ZEROS

Distinct: 11
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: -0.0167
Minimum: -2
Maximum: 8
Zeros: 14737
Zeros (%): 49.1%
Memory size: 234.5 KiB

Quantile statistics

Minimum: -2
5th percentile: -2
Q1: -1
Median: 0
Q3: 0
95th percentile: 2
Maximum: 8
Range: 10
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 1.123801528
Coefficient of variation (CV): -67.29350467
Kurtosis: 2.720715042
Mean: -0.0167
Median Absolute Deviation (MAD): 1
Skewness: 0.7319749269
Sum: -501
Variance: 1.262929874
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=11)

Common values

Value | Count | Frequency (%)
0 | 14737 | 49.1%
-1 | 5686 | 19.0%
1 | 3688 | 12.3%
-2 | 2759 | 9.2%
2 | 2667 | 8.9%
3 | 322 | 1.1%
4 | 76 | 0.3%
5 | 26 | 0.1%
8 | 19 | 0.1%
6 | 11 | < 0.1%

Minimum 10 values

Value | Count | Frequency (%)
-2 | 2759 | 9.2%
-1 | 5686 | 19.0%
0 | 14737 | 49.1%
1 | 3688 | 12.3%
2 | 2667 | 8.9%
3 | 322 | 1.1%
4 | 76 | 0.3%
5 | 26 | 0.1%
6 | 11 | < 0.1%
7 | 9 | < 0.1%

Maximum 10 values

Value | Count | Frequency (%)
8 | 19 | 0.1%
7 | 9 | < 0.1%
6 | 11 | < 0.1%
5 | 26 | 0.1%
4 | 76 | 0.3%
3 | 322 | 1.1%
2 | 2667 | 8.9%
1 | 3688 | 12.3%
0 | 14737 | 49.1%
-1 | 5686 | 19.0%

PAY_2
Real number (ℝ)

ZEROS

Distinct: 11
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: -0.1337666667
Minimum: -2
Maximum: 8
Zeros: 15730
Zeros (%): 52.4%
Memory size: 234.5 KiB

Quantile statistics

Minimum: -2
5th percentile: -2
Q1: -1
Median: 0
Q3: 0
95th percentile: 2
Maximum: 8
Range: 10
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 1.197185973
Coefficient of variation (CV): -8.949807922
Kurtosis: 1.57041773
Mean: -0.1337666667
Median Absolute Deviation (MAD): 0
Skewness: 0.7905650222
Sum: -4013
Variance: 1.433254254
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=11)

Common values

Value | Count | Frequency (%)
0 | 15730 | 52.4%
-1 | 6050 | 20.2%
2 | 3927 | 13.1%
-2 | 3782 | 12.6%
3 | 326 | 1.1%
4 | 99 | 0.3%
1 | 28 | 0.1%
5 | 25 | 0.1%
7 | 20 | 0.1%
6 | 12 | < 0.1%

Minimum 10 values

Value | Count | Frequency (%)
-2 | 3782 | 12.6%
-1 | 6050 | 20.2%
0 | 15730 | 52.4%
1 | 28 | 0.1%
2 | 3927 | 13.1%
3 | 326 | 1.1%
4 | 99 | 0.3%
5 | 25 | 0.1%
6 | 12 | < 0.1%
7 | 20 | 0.1%

Maximum 10 values

Value | Count | Frequency (%)
8 | 1 | < 0.1%
7 | 20 | 0.1%
6 | 12 | < 0.1%
5 | 25 | 0.1%
4 | 99 | 0.3%
3 | 326 | 1.1%
2 | 3927 | 13.1%
1 | 28 | 0.1%
0 | 15730 | 52.4%
-1 | 6050 | 20.2%

PAY_3
Real number (ℝ)

ZEROS

Distinct: 11
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: -0.1662
Minimum: -2
Maximum: 8
Zeros: 15764
Zeros (%): 52.5%
Memory size: 234.5 KiB

Quantile statistics

Minimum: -2
5th percentile: -2
Q1: -1
Median: 0
Q3: 0
95th percentile: 2
Maximum: 8
Range: 10
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 1.196867568
Coefficient of variation (CV): -7.201369245
Kurtosis: 2.084435875
Mean: -0.1662
Median Absolute Deviation (MAD): 0
Skewness: 0.8406818269
Sum: -4986
Variance: 1.432491976
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=11)

Common values

Value | Count | Frequency (%)
0 | 15764 | 52.5%
-1 | 5938 | 19.8%
-2 | 4085 | 13.6%
2 | 3819 | 12.7%
3 | 240 | 0.8%
4 | 76 | 0.3%
7 | 27 | 0.1%
6 | 23 | 0.1%
5 | 21 | 0.1%
1 | 4 | < 0.1%

Minimum 10 values

Value | Count | Frequency (%)
-2 | 4085 | 13.6%
-1 | 5938 | 19.8%
0 | 15764 | 52.5%
1 | 4 | < 0.1%
2 | 3819 | 12.7%
3 | 240 | 0.8%
4 | 76 | 0.3%
5 | 21 | 0.1%
6 | 23 | 0.1%
7 | 27 | 0.1%

Maximum 10 values

Value | Count | Frequency (%)
8 | 3 | < 0.1%
7 | 27 | 0.1%
6 | 23 | 0.1%
5 | 21 | 0.1%
4 | 76 | 0.3%
3 | 240 | 0.8%
2 | 3819 | 12.7%
1 | 4 | < 0.1%
0 | 15764 | 52.5%
-1 | 5938 | 19.8%

PAY_4
Real number (ℝ)

ZEROS

Distinct: 11
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: -0.2206666667
Minimum: -2
Maximum: 8
Zeros: 16455
Zeros (%): 54.9%
Memory size: 234.5 KiB

Quantile statistics

Minimum: -2
5th percentile: -2
Q1: -1
Median: 0
Q3: 0
95th percentile: 2
Maximum: 8
Range: 10
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 1.169138622
Coefficient of variation (CV): -5.29821128
Kurtosis: 3.496983496
Mean: -0.2206666667
Median Absolute Deviation (MAD): 0
Skewness: 0.9996294133
Sum: -6620
Variance: 1.366885118
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=11)

Common values

Value | Count | Frequency (%)
0 | 16455 | 54.9%
-1 | 5687 | 19.0%
-2 | 4348 | 14.5%
2 | 3159 | 10.5%
3 | 180 | 0.6%
4 | 69 | 0.2%
7 | 58 | 0.2%
5 | 35 | 0.1%
6 | 5 | < 0.1%
8 | 2 | < 0.1%

Minimum 10 values

Value | Count | Frequency (%)
-2 | 4348 | 14.5%
-1 | 5687 | 19.0%
0 | 16455 | 54.9%
1 | 2 | < 0.1%
2 | 3159 | 10.5%
3 | 180 | 0.6%
4 | 69 | 0.2%
5 | 35 | 0.1%
6 | 5 | < 0.1%
7 | 58 | 0.2%

Maximum 10 values

Value | Count | Frequency (%)
8 | 2 | < 0.1%
7 | 58 | 0.2%
6 | 5 | < 0.1%
5 | 35 | 0.1%
4 | 69 | 0.2%
3 | 180 | 0.6%
2 | 3159 | 10.5%
1 | 2 | < 0.1%
0 | 16455 | 54.9%
-1 | 5687 | 19.0%

PAY_5
Real number (ℝ)

ZEROS

Distinct: 10
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: -0.2662
Minimum: -2
Maximum: 8
Zeros: 16947
Zeros (%): 56.5%
Memory size: 234.5 KiB

Quantile statistics

Minimum: -2
5th percentile: -2
Q1: -1
Median: 0
Q3: 0
95th percentile: 2
Maximum: 8
Range: 10
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 1.133187406
Coefficient of variation (CV): -4.256902352
Kurtosis: 3.989748144
Mean: -0.2662
Median Absolute Deviation (MAD): 0
Skewness: 1.008197025
Sum: -7986
Variance: 1.284113697
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=10)

Common values

Value | Count | Frequency (%)
0 | 16947 | 56.5%
-1 | 5539 | 18.5%
-2 | 4546 | 15.2%
2 | 2626 | 8.8%
3 | 178 | 0.6%
4 | 84 | 0.3%
7 | 58 | 0.2%
5 | 17 | 0.1%
6 | 4 | < 0.1%
8 | 1 | < 0.1%

Minimum 10 values

Value | Count | Frequency (%)
-2 | 4546 | 15.2%
-1 | 5539 | 18.5%
0 | 16947 | 56.5%
2 | 2626 | 8.8%
3 | 178 | 0.6%
4 | 84 | 0.3%
5 | 17 | 0.1%
6 | 4 | < 0.1%
7 | 58 | 0.2%
8 | 1 | < 0.1%

Maximum 10 values

Value | Count | Frequency (%)
8 | 1 | < 0.1%
7 | 58 | 0.2%
6 | 4 | < 0.1%
5 | 17 | 0.1%
4 | 84 | 0.3%
3 | 178 | 0.6%
2 | 2626 | 8.8%
0 | 16947 | 56.5%
-1 | 5539 | 18.5%
-2 | 4546 | 15.2%

PAY_6
Real number (ℝ)

ZEROS

Distinct: 10
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: -0.2911
Minimum: -2
Maximum: 8
Zeros: 16286
Zeros (%): 54.3%
Memory size: 234.5 KiB

Quantile statistics

Minimum: -2
5th percentile: -2
Q1: -1
Median: 0
Q3: 0
95th percentile: 2
Maximum: 8
Range: 10
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 1.149987626
Coefficient of variation (CV): -3.950489954
Kurtosis: 3.42653413
Mean: -0.2911
Median Absolute Deviation (MAD): 0
Skewness: 0.9480293916
Sum: -8733
Variance: 1.322471539
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=10)

Common values

Value | Count | Frequency (%)
0 | 16286 | 54.3%
-1 | 5740 | 19.1%
-2 | 4895 | 16.3%
2 | 2766 | 9.2%
3 | 184 | 0.6%
4 | 49 | 0.2%
7 | 46 | 0.2%
6 | 19 | 0.1%
5 | 13 | < 0.1%
8 | 2 | < 0.1%

Minimum 10 values

Value | Count | Frequency (%)
-2 | 4895 | 16.3%
-1 | 5740 | 19.1%
0 | 16286 | 54.3%
2 | 2766 | 9.2%
3 | 184 | 0.6%
4 | 49 | 0.2%
5 | 13 | < 0.1%
6 | 19 | 0.1%
7 | 46 | 0.2%
8 | 2 | < 0.1%

Maximum 10 values

Value | Count | Frequency (%)
8 | 2 | < 0.1%
7 | 46 | 0.2%
6 | 19 | 0.1%
5 | 13 | < 0.1%
4 | 49 | 0.2%
3 | 184 | 0.6%
2 | 2766 | 9.2%
0 | 16286 | 54.3%
-1 | 5740 | 19.1%
-2 | 4895 | 16.3%

BILL_AMT1
Real number (ℝ)

HIGH CORRELATION
ZEROS

Distinct: 22723
Distinct (%): 75.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 51223.3309
Minimum: -165580
Maximum: 964511
Zeros: 2008
Zeros (%): 6.7%
Memory size: 234.5 KiB

Quantile statistics

Minimum: -165580
5th percentile: 0
Q1: 3558.75
Median: 22381.5
Q3: 67091
95th percentile: 201203.05
Maximum: 964511
Range: 1130091
Interquartile range (IQR): 63532.25

Descriptive statistics

Standard deviation: 73635.86058
Coefficient of variation (CV): 1.437545339
Kurtosis: 9.806289341
Mean: 51223.3309
Median Absolute Deviation (MAD): 21800.5
Skewness: 2.663861022
Sum: 1536699927
Variance: 5422239963
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values

Value | Count | Frequency (%)
0 | 2008 | 6.7%
390 | 244 | 0.8%
780 | 76 | 0.3%
326 | 72 | 0.2%
316 | 63 | 0.2%
2500 | 59 | 0.2%
396 | 49 | 0.2%
2400 | 39 | 0.1%
416 | 29 | 0.1%
500 | 25 | 0.1%
Other values (22713) | 27336 | 91.1%

Minimum 10 values

Value | Count | Frequency (%)
-165580 | 1 | < 0.1%
-154973 | 1 | < 0.1%
-15308 | 1 | < 0.1%
-14386 | 1 | < 0.1%
-11545 | 1 | < 0.1%
-10682 | 1 | < 0.1%
-9802 | 1 | < 0.1%
-9095 | 1 | < 0.1%
-8187 | 1 | < 0.1%
-7438 | 1 | < 0.1%

Maximum 10 values

Value | Count | Frequency (%)
964511 | 1 | < 0.1%
746814 | 1 | < 0.1%
653062 | 1 | < 0.1%
630458 | 1 | < 0.1%
626648 | 1 | < 0.1%
621749 | 1 | < 0.1%
613860 | 1 | < 0.1%
610723 | 1 | < 0.1%
608594 | 1 | < 0.1%
604019 | 1 | < 0.1%

BILL_AMT2
Real number (ℝ)

HIGH CORRELATION
ZEROS

Distinct: 22346
Distinct (%): 74.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 49179.07517
Minimum: -69777
Maximum: 983931
Zeros: 2506
Zeros (%): 8.4%
Memory size: 234.5 KiB

Quantile statistics

Minimum: -69777
5th percentile: 0
Q1: 2984.75
Median: 21200
Q3: 64006.25
95th percentile: 194792.2
Maximum: 983931
Range: 1053708
Interquartile range (IQR): 61021.5

Descriptive statistics

Standard deviation: 71173.76878
Coefficient of variation (CV): 1.447236829
Kurtosis: 10.30294592
Mean: 49179.07517
Median Absolute Deviation (MAD): 20810
Skewness: 2.705220853
Sum: 1475372255
Variance: 5065705363
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values

Value | Count | Frequency (%)
0 | 2506 | 8.4%
390 | 231 | 0.8%
326 | 75 | 0.2%
780 | 75 | 0.2%
316 | 72 | 0.2%
2500 | 51 | 0.2%
396 | 51 | 0.2%
2400 | 42 | 0.1%
-200 | 29 | 0.1%
416 | 28 | 0.1%
Other values (22336) | 26840 | 89.5%

Minimum 10 values

Value | Count | Frequency (%)
-69777 | 1 | < 0.1%
-67526 | 1 | < 0.1%
-33350 | 1 | < 0.1%
-30000 | 1 | < 0.1%
-26214 | 1 | < 0.1%
-24704 | 1 | < 0.1%
-24702 | 1 | < 0.1%
-22960 | 1 | < 0.1%
-18618 | 1 | < 0.1%
-18088 | 1 | < 0.1%

Maximum 10 values

Value | Count | Frequency (%)
983931 | 1 | < 0.1%
743970 | 1 | < 0.1%
671563 | 1 | < 0.1%
646770 | 1 | < 0.1%
624475 | 1 | < 0.1%
605943 | 1 | < 0.1%
597793 | 1 | < 0.1%
586825 | 1 | < 0.1%
581775 | 1 | < 0.1%
577681 | 1 | < 0.1%

BILL_AMT3
Real number (ℝ)

HIGH CORRELATION
ZEROS

Distinct: 22026
Distinct (%): 73.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 47013.1548
Minimum: -157264
Maximum: 1664089
Zeros: 2870
Zeros (%): 9.6%
Memory size: 234.5 KiB

Quantile statistics

Minimum: -157264
5th percentile: 0
Q1: 2666.25
Median: 20088.5
Q3: 60164.75
95th percentile: 187821.05
Maximum: 1664089
Range: 1821353
Interquartile range (IQR): 57498.5

Descriptive statistics

Standard deviation: 69349.38743
Coefficient of variation (CV): 1.475106015
Kurtosis: 19.78325514
Mean: 47013.1548
Median Absolute Deviation (MAD): 19708.5
Skewness: 3.087830046
Sum: 1410394644
Variance: 4809337537
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values

Value | Count | Frequency (%)
0 | 2870 | 9.6%
390 | 275 | 0.9%
780 | 74 | 0.2%
326 | 63 | 0.2%
316 | 62 | 0.2%
396 | 48 | 0.2%
2500 | 40 | 0.1%
2400 | 39 | 0.1%
416 | 29 | 0.1%
200 | 27 | 0.1%
Other values (22016) | 26473 | 88.2%

Minimum 10 values

Value | Count | Frequency (%)
-157264 | 1 | < 0.1%
-61506 | 1 | < 0.1%
-46127 | 1 | < 0.1%
-34041 | 1 | < 0.1%
-25443 | 1 | < 0.1%
-24702 | 1 | < 0.1%
-20320 | 1 | < 0.1%
-17706 | 1 | < 0.1%
-15910 | 1 | < 0.1%
-15641 | 1 | < 0.1%

Maximum 10 values

Value | Count | Frequency (%)
1664089 | 1 | < 0.1%
855086 | 1 | < 0.1%
693131 | 1 | < 0.1%
689643 | 1 | < 0.1%
689627 | 1 | < 0.1%
632041 | 1 | < 0.1%
597415 | 1 | < 0.1%
578971 | 1 | < 0.1%
577957 | 1 | < 0.1%
577015 | 1 | < 0.1%

BILL_AMT4
Real number (ℝ)

HIGH CORRELATION
ZEROS

Distinct: 21548
Distinct (%): 71.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 43262.94897
Minimum: -170000
Maximum: 891586
Zeros: 3195
Zeros (%): 10.7%
Memory size: 234.5 KiB

Quantile statistics

Minimum: -170000
5th percentile: 0
Q1: 2326.75
Median: 19052
Q3: 54506
95th percentile: 174333.35
Maximum: 891586
Range: 1061586
Interquartile range (IQR): 52179.25

Descriptive statistics

Standard deviation: 64332.85613
Coefficient of variation (CV): 1.487019671
Kurtosis: 11.30932483
Mean: 43262.94897
Median Absolute Deviation (MAD): 18656
Skewness: 2.821965291
Sum: 1297888469
Variance: 4138716378
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values

Value | Count | Frequency (%)
0 | 3195 | 10.7%
390 | 246 | 0.8%
780 | 101 | 0.3%
316 | 68 | 0.2%
326 | 62 | 0.2%
396 | 44 | 0.1%
150 | 39 | 0.1%
2400 | 39 | 0.1%
2500 | 34 | 0.1%
1000 | 33 | 0.1%
Other values (21538) | 26139 | 87.1%

Minimum 10 values

Value | Count | Frequency (%)
-170000 | 1 | < 0.1%
-81334 | 1 | < 0.1%
-65167 | 1 | < 0.1%
-50616 | 1 | < 0.1%
-46627 | 1 | < 0.1%
-34503 | 1 | < 0.1%
-27490 | 1 | < 0.1%
-24303 | 1 | < 0.1%
-22108 | 1 | < 0.1%
-20320 | 1 | < 0.1%

Maximum 10 values

Value | Count | Frequency (%)
891586 | 1 | < 0.1%
706864 | 1 | < 0.1%
628699 | 1 | < 0.1%
616836 | 1 | < 0.1%
572805 | 1 | < 0.1%
569034 | 1 | < 0.1%
565669 | 1 | < 0.1%
563543 | 1 | < 0.1%
548020 | 1 | < 0.1%
542653 | 1 | < 0.1%

BILL_AMT5
Real number (ℝ)

HIGH CORRELATION
ZEROS

Distinct: 21010
Distinct (%): 70.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 40311.40097
Minimum: -81334
Maximum: 927171
Zeros: 3506
Zeros (%): 11.7%
Memory size: 234.5 KiB

Quantile statistics

Minimum: -81334
5th percentile: 0
Q1: 1763
Median: 18104.5
Q3: 50190.5
95th percentile: 165794.3
Maximum: 927171
Range: 1008505
Interquartile range (IQR): 48427.5

Descriptive statistics

Standard deviation: 60797.15577
Coefficient of variation (CV): 1.508187617
Kurtosis: 12.30588129
Mean: 40311.40097
Median Absolute Deviation (MAD): 17688.5
Skewness: 2.876379867
Sum: 1209342029
Variance: 3696294150
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values

Value | Count | Frequency (%)
0 | 3506 | 11.7%
390 | 235 | 0.8%
780 | 94 | 0.3%
316 | 79 | 0.3%
326 | 62 | 0.2%
150 | 58 | 0.2%
396 | 47 | 0.2%
2400 | 39 | 0.1%
2500 | 37 | 0.1%
416 | 36 | 0.1%
Other values (21000) | 25807 | 86.0%

Minimum 10 values

Value | Count | Frequency (%)
-81334 | 1 | < 0.1%
-61372 | 1 | < 0.1%
-53007 | 1 | < 0.1%
-46627 | 1 | < 0.1%
-37594 | 1 | < 0.1%
-36156 | 1 | < 0.1%
-30481 | 1 | < 0.1%
-28335 | 1 | < 0.1%
-23003 | 1 | < 0.1%
-20753 | 1 | < 0.1%

Maximum 10 values

Value | Count | Frequency (%)
927171 | 1 | < 0.1%
823540 | 1 | < 0.1%
587067 | 1 | < 0.1%
551702 | 1 | < 0.1%
547880 | 1 | < 0.1%
530672 | 1 | < 0.1%
524315 | 1 | < 0.1%
516139 | 1 | < 0.1%
514114 | 1 | < 0.1%
508213 | 1 | < 0.1%

BILL_AMT6
Real number (ℝ)

HIGH CORRELATION
ZEROS

Distinct: 20604
Distinct (%): 68.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 38871.7604
Minimum: -339603
Maximum: 961664
Zeros: 4020
Zeros (%): 13.4%
Memory size: 234.5 KiB

Quantile statistics

Minimum: -339603
5th percentile: 0
Q1: 1256
Median: 17071
Q3: 49198.25
95th percentile: 161912
Maximum: 961664
Range: 1301267
Interquartile range (IQR): 47942.25

Descriptive statistics

Standard deviation: 59554.10754
Coefficient of variation (CV): 1.53206613
Kurtosis: 12.27070529
Mean: 38871.7604
Median Absolute Deviation (MAD): 16755
Skewness: 2.846644576
Sum: 1166152812
Variance: 3546691724
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values

Value | Count | Frequency (%)
0 | 4020 | 13.4%
390 | 207 | 0.7%
780 | 86 | 0.3%
150 | 78 | 0.3%
316 | 77 | 0.3%
326 | 56 | 0.2%
396 | 45 | 0.1%
416 | 36 | 0.1%
-18 | 33 | 0.1%
2400 | 32 | 0.1%
Other values (20594) | 25330 | 84.4%

Minimum 10 values

Value | Count | Frequency (%)
-339603 | 1 | < 0.1%
-209051 | 1 | < 0.1%
-150953 | 1 | < 0.1%
-94625 | 1 | < 0.1%
-73895 | 1 | < 0.1%
-57060 | 1 | < 0.1%
-51443 | 1 | < 0.1%
-51183 | 1 | < 0.1%
-46627 | 1 | < 0.1%
-45734 | 1 | < 0.1%

Maximum 10 values

Value | Count | Frequency (%)
961664 | 1 | < 0.1%
699944 | 1 | < 0.1%
568638 | 1 | < 0.1%
527711 | 1 | < 0.1%
527566 | 1 | < 0.1%
514975 | 1 | < 0.1%
513798 | 1 | < 0.1%
511905 | 1 | < 0.1%
501370 | 1 | < 0.1%
499100 | 1 | < 0.1%

PAY_AMT1
Real number (ℝ≥0)

ZEROS

Distinct: 7943
Distinct (%): 26.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 5663.5805
Minimum: 0
Maximum: 873552
Zeros: 5249
Zeros (%): 17.5%
Memory size: 234.5 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 1000
Median: 2100
Q3: 5006
95th percentile: 18428.2
Maximum: 873552
Range: 873552
Interquartile range (IQR): 4006

Descriptive statistics

Standard deviation: 16563.28035
Coefficient of variation (CV): 2.924524575
Kurtosis: 415.2547427
Mean: 5663.5805
Median Absolute Deviation (MAD): 1932
Skewness: 14.66836433
Sum: 169907415
Variance: 274342256.1
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values

Value | Count | Frequency (%)
0 | 5249 | 17.5%
2000 | 1363 | 4.5%
3000 | 891 | 3.0%
5000 | 698 | 2.3%
1500 | 507 | 1.7%
4000 | 426 | 1.4%
10000 | 401 | 1.3%
1000 | 365 | 1.2%
2500 | 298 | 1.0%
6000 | 294 | 1.0%
Other values (7933) | 19508 | 65.0%

Minimum 10 values

Value | Count | Frequency (%)
0 | 5249 | 17.5%
1 | 9 | < 0.1%
2 | 14 | < 0.1%
3 | 15 | 0.1%
4 | 18 | 0.1%
5 | 12 | < 0.1%
6 | 15 | 0.1%
7 | 9 | < 0.1%
8 | 8 | < 0.1%
9 | 7 | < 0.1%

Maximum 10 values

Value | Count | Frequency (%)
873552 | 1 | < 0.1%
505000 | 1 | < 0.1%
493358 | 1 | < 0.1%
423903 | 1 | < 0.1%
405016 | 1 | < 0.1%
368199 | 1 | < 0.1%
323014 | 1 | < 0.1%
304815 | 1 | < 0.1%
302000 | 1 | < 0.1%
300039 | 1 | < 0.1%

PAY_AMT2
Real number (ℝ≥0)

SKEWED
ZEROS

Distinct: 7899
Distinct (%): 26.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 5921.1635
Minimum: 0
Maximum: 1684259
Zeros: 5396
Zeros (%): 18.0%
Memory size: 234.5 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 833
Median: 2009
Q3: 5000
95th percentile: 19004.35
Maximum: 1684259
Range: 1684259
Interquartile range (IQR): 4167

Descriptive statistics

Standard deviation: 23040.8704
Coefficient of variation (CV): 3.891274139
Kurtosis: 1641.631911
Mean: 5921.1635
Median Absolute Deviation (MAD): 1991
Skewness: 30.45381745
Sum: 177634905
Variance: 530881708.9
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values

Value | Count | Frequency (%)
0 | 5396 | 18.0%
2000 | 1290 | 4.3%
3000 | 857 | 2.9%
5000 | 717 | 2.4%
1000 | 594 | 2.0%
1500 | 521 | 1.7%
4000 | 410 | 1.4%
10000 | 318 | 1.1%
6000 | 283 | 0.9%
2500 | 251 | 0.8%
Other values (7889) | 19363 | 64.5%

Minimum 10 values

Value | Count | Frequency (%)
0 | 5396 | 18.0%
1 | 15 | 0.1%
2 | 20 | 0.1%
3 | 18 | 0.1%
4 | 11 | < 0.1%
5 | 25 | 0.1%
6 | 8 | < 0.1%
7 | 12 | < 0.1%
8 | 9 | < 0.1%
9 | 6 | < 0.1%

Maximum 10 values

Value | Count | Frequency (%)
1684259 | 1 | < 0.1%
1227082 | 1 | < 0.1%
1215471 | 1 | < 0.1%
1024516 | 1 | < 0.1%
580464 | 1 | < 0.1%
415552 | 1 | < 0.1%
401003 | 1 | < 0.1%
388126 | 1 | < 0.1%
385228 | 1 | < 0.1%
384986 | 1 | < 0.1%

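The Skewness and Kurtosis rows, and the PAY_AMT2 skewness warning (γ1 ≈ 30.45), all depend on which estimator is used. The sketch below implements the bias-corrected sample skewness G1 and excess kurtosis G2, the estimators that pandas' `Series.skew()` and `Series.kurt()` compute. It is an assumption that pandas-profiling reports exactly these. The assumption is consistent with ID (1..30000) showing skewness 0 and kurtosis -1.2, the excess kurtosis of a uniform distribution.

```python
import math

def _central_moments(xs):
    n = len(xs)
    mean = math.fsum(xs) / n
    m2 = math.fsum((x - mean) ** 2 for x in xs) / n
    m3 = math.fsum((x - mean) ** 3 for x in xs) / n
    m4 = math.fsum((x - mean) ** 4 for x in xs) / n
    return n, m2, m3, m4

def skewness(xs):
    """Adjusted Fisher-Pearson skewness G1."""
    n, m2, m3, _ = _central_moments(xs)
    g1 = m3 / m2 ** 1.5
    return g1 * math.sqrt(n * (n - 1)) / (n - 2)

def excess_kurtosis(xs):
    """Bias-corrected sample excess kurtosis G2 (0 for a normal distribution)."""
    n, m2, _, m4 = _central_moments(xs)
    g2 = m4 / m2 ** 2 - 3
    return (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6)

ids = list(range(1, 30001))
print(round(skewness(ids), 6), round(excess_kurtosis(ids), 4))

# A long right tail, like the payment amounts, gives a large positive skewness.
print(skewness([0] * 50 + [1000] * 45 + [500000] * 5) > 0)
```
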
PAY_AMT3
Real number (ℝ≥0)

ZEROS

Distinct: 7518
Distinct (%): 25.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 5225.6815
Minimum: 0
Maximum: 896040
Zeros: 5968
Zeros (%): 19.9%
Memory size: 234.5 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 390
Median: 1800
Q3: 4505
95th percentile: 17589.4
Maximum: 896040
Range: 896040
Interquartile range (IQR): 4115

Descriptive statistics

Standard deviation: 17606.96147
Coefficient of variation (CV): 3.36931393
Kurtosis: 564.3112295
Mean: 5225.6815
Median Absolute Deviation (MAD): 1795
Skewness: 17.21663544
Sum: 156770445
Variance: 310005092.2
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values

Value | Count | Frequency (%)
0 | 5968 | 19.9%
2000 | 1285 | 4.3%
1000 | 1103 | 3.7%
3000 | 870 | 2.9%
5000 | 721 | 2.4%
1500 | 490 | 1.6%
4000 | 381 | 1.3%
10000 | 312 | 1.0%
1200 | 243 | 0.8%
6000 | 241 | 0.8%
Other values (7508) | 18386 | 61.3%

Minimum 10 values

Value | Count | Frequency (%)
0 | 5968 | 19.9%
1 | 13 | < 0.1%
2 | 19 | 0.1%
3 | 14 | < 0.1%
4 | 15 | 0.1%
5 | 18 | 0.1%
6 | 14 | < 0.1%
7 | 18 | 0.1%
8 | 10 | < 0.1%
9 | 12 | < 0.1%

Maximum 10 values

Value | Count | Frequency (%)
896040 | 1 | < 0.1%
889043 | 1 | < 0.1%
508229 | 1 | < 0.1%
417588 | 1 | < 0.1%
400972 | 1 | < 0.1%
397092 | 1 | < 0.1%
380478 | 1 | < 0.1%
371718 | 1 | < 0.1%
349395 | 1 | < 0.1%
344261 | 1 | < 0.1%

PAY_AMT4
Real number (ℝ≥0)

ZEROS

Distinct: 6937
Distinct (%): 23.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4826.076867
Minimum: 0
Maximum: 621000
Zeros: 6408
Zeros (%): 21.4%
Memory size: 234.5 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 296
Median: 1500
Q3: 4013.25
95th percentile: 16014.95
Maximum: 621000
Range: 621000
Interquartile range (IQR): 3717.25

Descriptive statistics

Standard deviation: 15666.15974
Coefficient of variation (CV): 3.246147995
Kurtosis: 277.3337677
Mean: 4826.076867
Median Absolute Deviation (MAD): 1500
Skewness: 12.90498482
Sum: 144782306
Variance: 245428561.1
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values

Value | Count | Frequency (%)
0 | 6408 | 21.4%
1000 | 1394 | 4.6%
2000 | 1214 | 4.0%
3000 | 887 | 3.0%
5000 | 810 | 2.7%
1500 | 441 | 1.5%
4000 | 402 | 1.3%
10000 | 341 | 1.1%
2500 | 259 | 0.9%
500 | 258 | 0.9%
Other values (6927) | 17586 | 58.6%

Minimum 10 values

Value | Count | Frequency (%)
0 | 6408 | 21.4%
1 | 22 | 0.1%
2 | 22 | 0.1%
3 | 13 | < 0.1%
4 | 20 | 0.1%
5 | 12 | < 0.1%
6 | 16 | 0.1%
7 | 11 | < 0.1%
8 | 7 | < 0.1%
9 | 9 | < 0.1%

Maximum 10 values

Value | Count | Frequency (%)
621000 | 1 | < 0.1%
528897 | 1 | < 0.1%
497000 | 1 | < 0.1%
432130 | 1 | < 0.1%
400046 | 1 | < 0.1%
331788 | 1 | < 0.1%
330982 | 1 | < 0.1%
320008 | 1 | < 0.1%
313094 | 1 | < 0.1%
292962 | 1 | < 0.1%

PAY_AMT5
Real number (ℝ≥0)

ZEROS

Distinct: 6897
Distinct (%): 23.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4799.387633
Minimum: 0
Maximum: 426529
Zeros: 6703
Zeros (%): 22.3%
Memory size: 234.5 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 252.5
Median: 1500
Q3: 4031.5
95th percentile: 16000
Maximum: 426529
Range: 426529
Interquartile range (IQR): 3779

Descriptive statistics

Standard deviation: 15278.30568
Coefficient of variation (CV): 3.183386475
Kurtosis: 180.0639402
Mean: 4799.387633
Median Absolute Deviation (MAD): 1500
Skewness: 11.12741705
Sum: 143981629
Variance: 233426624.4
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values

Value | Count | Frequency (%)
0 | 6703 | 22.3%
1000 | 1340 | 4.5%
2000 | 1323 | 4.4%
3000 | 947 | 3.2%
5000 | 814 | 2.7%
1500 | 426 | 1.4%
4000 | 401 | 1.3%
10000 | 343 | 1.1%
500 | 250 | 0.8%
6000 | 247 | 0.8%
Other values (6887) | 17206 | 57.4%

Minimum 10 values

Value | Count | Frequency (%)
0 | 6703 | 22.3%
1 | 21 | 0.1%
2 | 13 | < 0.1%
3 | 13 | < 0.1%
4 | 12 | < 0.1%
5 | 9 | < 0.1%
6 | 7 | < 0.1%
7 | 9 | < 0.1%
8 | 6 | < 0.1%
9 | 6 | < 0.1%

Maximum 10 values

Value | Count | Frequency (%)
426529 | 1 | < 0.1%
417990 | 1 | < 0.1%
388071 | 1 | < 0.1%
379267 | 1 | < 0.1%
332000 | 1 | < 0.1%
331788 | 1 | < 0.1%
330982 | 1 | < 0.1%
326889 | 1 | < 0.1%
317077 | 1 | < 0.1%
310135 | 1 | < 0.1%

PAY_AMT6
Real number (ℝ≥0)

ZEROS

Distinct: 6939
Distinct (%): 23.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 5215.502567
Minimum: 0
Maximum: 528666
Zeros: 7173
Zeros (%): 23.9%
Memory size: 234.5 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 117.75
Median: 1500
Q3: 4000
95th percentile: 17343.8
Maximum: 528666
Range: 528666
Interquartile range (IQR): 3882.25

Descriptive statistics

Standard deviation: 17777.46578
Coefficient of variation (CV): 3.408581541
Kurtosis: 167.1614296
Mean: 5215.502567
Median Absolute Deviation (MAD): 1500
Skewness: 10.64072733
Sum: 156465077
Variance: 316038289.4
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values

Value | Count | Frequency (%)
0 | 7173 | 23.9%
1000 | 1299 | 4.3%
2000 | 1295 | 4.3%
3000 | 914 | 3.0%
5000 | 808 | 2.7%
1500 | 439 | 1.5%
4000 | 411 | 1.4%
10000 | 356 | 1.2%
500 | 247 | 0.8%
6000 | 220 | 0.7%
Other values (6929) | 16838 | 56.1%

Minimum 10 values

Value | Count | Frequency (%)
0 | 7173 | 23.9%
1 | 20 | 0.1%
2 | 9 | < 0.1%
3 | 14 | < 0.1%
4 | 12 | < 0.1%
5 | 7 | < 0.1%
6 | 6 | < 0.1%
7 | 5 | < 0.1%
8 | 6 | < 0.1%
9 | 7 | < 0.1%

Maximum 10 values

Value | Count | Frequency (%)
528666 | 1 | < 0.1%
527143 | 1 | < 0.1%
443001 | 1 | < 0.1%
422000 | 1 | < 0.1%
403500 | 1 | < 0.1%
377000 | 1 | < 0.1%
372495 | 1 | < 0.1%
351282 | 1 | < 0.1%
345293 | 1 | < 0.1%
308000 | 1 | < 0.1%

default payment next month
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 234.5 KiB
0: 23364
1: 6636

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 30000
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1
2nd row: 1
3rd row: 0
4th row: 0
5th row: 0

Value | Count | Frequency (%)
0 | 23364 | 77.9%
1 | 6636 | 22.1%

Histogram of lengths of the category

Value | Count | Frequency (%)
0 | 23364 | 77.9%
1 | 6636 | 22.1%

Most occurring characters

Value | Count | Frequency (%)
0 | 23364 | 77.9%
1 | 6636 | 22.1%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 30000 | 100.0%

Most frequent character per category

Value | Count | Frequency (%)
0 | 23364 | 77.9%
1 | 6636 | 22.1%

Most occurring scripts

Value | Count | Frequency (%)
Common | 30000 | 100.0%

Most frequent character per script

Value | Count | Frequency (%)
0 | 23364 | 77.9%
1 | 6636 | 22.1%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 30000 | 100.0%

Most frequent character per block

Value | Count | Frequency (%)
0 | 23364 | 77.9%
1 | 6636 | 22.1%

Interactions

Correlations

Pearson's r

Pearson's correlation coefficient (r) measures the linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate shifts and positive rescalings of the two variables. As a result, the slope of an exactly linear relationship does not affect r: any such relationship with a nonzero slope gives r = +1 or r = -1.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.

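The definition of r translates directly into code. The sketch below is plain Python (the function name `pearson_r` is ours) and includes a check of the invariance under shifting and positive rescaling.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r: covariance of X and Y divided by the product of their standard deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs) / (n - 1))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys) / (n - 1))
    return cov / (sx * sy)

xs = [1, 2, 3, 4, 5]
ys = [2, 1, 4, 3, 5]
print(pearson_r(xs, ys))                          # about 0.8
print(pearson_r(xs, [10 * y + 3 for y in ys]))    # unchanged by shifting and positive rescaling
print(pearson_r(xs, [7 - 2 * x for x in xs]))     # exactly linear, negative slope: about -1.0
```
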
Spearman's ρ

Spearman's rank correlation coefficient (ρ) measures the monotonic correlation between two variables, so it captures nonlinear monotonic relationships better than Pearson's r. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.

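Spearman's ρ is Pearson's r applied to ranks. The sketch below uses plain Python and assumes tied values get the average of the ranks they occupy, the usual convention. Pearson's r is redefined so the block is self-contained. On a cubic relationship, ρ is 1 while r is not.

```python
import math

def average_ranks(values):
    """1-based ranks; tied values share the average of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)  # the (n - 1) factors cancel

def spearman_rho(xs, ys):
    """Spearman's rho: Pearson's r computed on the rank variables."""
    return pearson_r(average_ranks(xs), average_ranks(ys))

xs = [1, 2, 3, 4, 5]
ys = [x ** 3 for x in xs]                 # nonlinear but perfectly monotonic
print(spearman_rho(xs, ys), pearson_r(xs, ys))
print(average_ranks([10, 20, 20, 30]))    # [1.0, 2.5, 2.5, 4.0]
```
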
Kendall's τ

Like Spearman's rank correlation coefficient, Kendall's rank correlation coefficient (τ) measures the ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one counts the concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs, n(n-1)/2. This is the τ_a variant; the τ_b variant additionally corrects for ties.

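The pair-counting definition of Kendall's τ_a can be written as a short O(n²) loop in plain Python. The function name is ours. Pairs tied in either variable count as neither concordant nor discordant, and no tie correction is applied.

```python
from itertools import combinations

def kendall_tau_a(xs, ys):
    """Kendall's tau-a: (concordant - discordant) / total number of pairs; no tie correction."""
    n = len(xs)
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        sign = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
        # pairs tied in X or Y (sign == 0) count as neither
    return (concordant - discordant) / (n * (n - 1) / 2)

xs = [1, 2, 3, 4, 5]
ys = [2, 1, 4, 3, 5]
print(kendall_tau_a(xs, ys))   # 8 concordant, 2 discordant of 10 pairs -> 0.6
```
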
Phik (φk)

Phik (φk) is a newer, practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient when the input distribution is bivariate normal. Extensive documentation is available for the phik package.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, where 0 indicates independence and 1 indicates perfect association. The empirical estimators of Cramér's V have been shown to be biased, even for large samples, so we use the bias-corrected measure proposed by Bergsma in 2013.

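Here is a minimal plain-Python sketch of a bias-corrected Cramér's V. It uses the correction formulas commonly attributed to Bergsma (2013): φ² is reduced by (k − 1)(r − 1)/(n − 1), and both table dimensions are shrunk by (dim − 1)²/(n − 1). The function name is ours. It is an assumption that pandas-profiling's implementation matches this in every detail, for example in how degenerate one-category tables are handled.

```python
import math
from collections import Counter

def cramers_v_corrected(xs, ys):
    """Bias-corrected Cramér's V for two equal-length sequences of nominal values."""
    n = len(xs)
    row_totals, col_totals = Counter(xs), Counter(ys)
    joint = Counter(zip(xs, ys))
    # Pearson chi-squared statistic of the r x k contingency table.
    chi2 = 0.0
    for a, ra in row_totals.items():
        for b, cb in col_totals.items():
            expected = ra * cb / n
            chi2 += (joint[(a, b)] - expected) ** 2 / expected
    r, k = len(row_totals), len(col_totals)
    phi2 = chi2 / n
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(k_corr - 1, r_corr - 1)
    # A table with a single row or column carries no association.
    return math.sqrt(phi2_corr / denom) if denom > 0 else 0.0

perfect_x = ["a"] * 50 + ["b"] * 50
perfect_y = ["x"] * 50 + ["y"] * 50
indep_x = ["a", "a", "b", "b"] * 25
indep_y = ["x", "y", "x", "y"] * 25
print(cramers_v_corrected(perfect_x, perfect_y), cramers_v_corrected(indep_x, indep_y))
```
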
Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out visual patterns in data completion.

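The missing-value views summarise nullity per column and per row. The sketch below computes the counts behind them. It assumes rows are dicts and missing entries are `None`. This dataset has no missing cells, so the example rows are hypothetical, and the function names are ours.

```python
def nullity_counts(rows, columns):
    """Number of non-missing values per column (what a per-column nullity chart plots)."""
    return {c: sum(1 for r in rows if r.get(c) is not None) for c in columns}

def nullity_matrix(rows, columns):
    """One text line per row: '#' where a value is present, '.' where it is missing."""
    return ["".join("#" if r.get(c) is not None else "." for c in columns) for r in rows]

columns = ["AGE", "LIMIT_BAL", "SEX"]
rows = [{"AGE": 24, "LIMIT_BAL": 20000, "SEX": "2"},
        {"AGE": None, "LIMIT_BAL": 120000, "SEX": "2"},
        {"AGE": 34, "LIMIT_BAL": None, "SEX": None}]
print(nullity_counts(rows, columns))
print("\n".join(nullity_matrix(rows, columns)))
```
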
Sample

First rows

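index | ID | LIMIT_BAL | SEX | EDUCATION | MARRIAGE | AGE | PAY_0 | PAY_2 | PAY_3 | PAY_4 | PAY_5 | PAY_6 | BILL_AMT1 | BILL_AMT2 | BILL_AMT3 | BILL_AMT4 | BILL_AMT5 | BILL_AMT6 | PAY_AMT1 | PAY_AMT2 | PAY_AMT3 | PAY_AMT4 | PAY_AMT5 | PAY_AMT6 | default payment next month
0 | 1 | 20000 | 2 | 2 | 1 | 24 | 2 | 2 | -1 | -1 | -2 | -2 | 3913 | 3102 | 689 | 0 | 0 | 0 | 0 | 689 | 0 | 0 | 0 | 0 | 1
1 | 2 | 120000 | 2 | 2 | 2 | 26 | -1 | 2 | 0 | 0 | 0 | 2 | 2682 | 1725 | 2682 | 3272 | 3455 | 3261 | 0 | 1000 | 1000 | 1000 | 0 | 2000 | 1
2 | 3 | 90000 | 2 | 2 | 2 | 34 | 0 | 0 | 0 | 0 | 0 | 0 | 29239 | 14027 | 13559 | 14331 | 14948 | 15549 | 1518 | 1500 | 1000 | 1000 | 1000 | 5000 | 0
3 | 4 | 50000 | 2 | 2 | 1 | 37 | 0 | 0 | 0 | 0 | 0 | 0 | 46990 | 48233 | 49291 | 28314 | 28959 | 29547 | 2000 | 2019 | 1200 | 1100 | 1069 | 1000 | 0
4 | 5 | 50000 | 1 | 2 | 1 | 57 | -1 | 0 | -1 | 0 | 0 | 0 | 8617 | 5670 | 35835 | 20940 | 19146 | 19131 | 2000 | 36681 | 10000 | 9000 | 689 | 679 | 0
5 | 6 | 50000 | 1 | 1 | 2 | 37 | 0 | 0 | 0 | 0 | 0 | 0 | 64400 | 57069 | 57608 | 19394 | 19619 | 20024 | 2500 | 1815 | 657 | 1000 | 1000 | 800 | 0
6 | 7 | 500000 | 1 | 1 | 2 | 29 | 0 | 0 | 0 | 0 | 0 | 0 | 367965 | 412023 | 445007 | 542653 | 483003 | 473944 | 55000 | 40000 | 38000 | 20239 | 13750 | 13770 | 0
7 | 8 | 100000 | 2 | 2 | 2 | 23 | 0 | -1 | -1 | 0 | 0 | -1 | 11876 | 380 | 601 | 221 | -159 | 567 | 380 | 601 | 0 | 581 | 1687 | 1542 | 0
8 | 9 | 140000 | 2 | 3 | 1 | 28 | 0 | 0 | 2 | 0 | 0 | 0 | 11285 | 14096 | 12108 | 12211 | 11793 | 3719 | 3329 | 0 | 432 | 1000 | 1000 | 1000 | 0
9 | 10 | 20000 | 1 | 3 | 2 | 35 | -2 | -2 | -2 | -2 | -1 | -1 | 0 | 0 | 0 | 0 | 13007 | 13912 | 0 | 0 | 0 | 13007 | 1122 | 0 | 0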

Last rows

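index | ID | LIMIT_BAL | SEX | EDUCATION | MARRIAGE | AGE | PAY_0 | PAY_2 | PAY_3 | PAY_4 | PAY_5 | PAY_6 | BILL_AMT1 | BILL_AMT2 | BILL_AMT3 | BILL_AMT4 | BILL_AMT5 | BILL_AMT6 | PAY_AMT1 | PAY_AMT2 | PAY_AMT3 | PAY_AMT4 | PAY_AMT5 | PAY_AMT6 | default payment next month
29990 | 29991 | 140000 | 1 | 2 | 1 | 41 | 0 | 0 | 0 | 0 | 0 | 0 | 138325 | 137142 | 139110 | 138262 | 49675 | 46121 | 6000 | 7000 | 4228 | 1505 | 2000 | 2000 | 0
29991 | 29992 | 210000 | 1 | 2 | 1 | 34 | 3 | 2 | 2 | 2 | 2 | 2 | 2500 | 2500 | 2500 | 2500 | 2500 | 2500 | 0 | 0 | 0 | 0 | 0 | 0 | 1
29992 | 29993 | 10000 | 1 | 3 | 1 | 43 | 0 | 0 | 0 | -2 | -2 | -2 | 8802 | 10400 | 0 | 0 | 0 | 0 | 2000 | 0 | 0 | 0 | 0 | 0 | 0
29993 | 29994 | 100000 | 1 | 1 | 2 | 38 | 0 | -1 | -1 | 0 | 0 | 0 | 3042 | 1427 | 102996 | 70626 | 69473 | 55004 | 2000 | 111784 | 4000 | 3000 | 2000 | 2000 | 0
29994 | 29995 | 80000 | 1 | 2 | 2 | 34 | 2 | 2 | 2 | 2 | 2 | 2 | 72557 | 77708 | 79384 | 77519 | 82607 | 81158 | 7000 | 3500 | 0 | 7000 | 0 | 4000 | 1
29995 | 29996 | 220000 | 1 | 3 | 1 | 39 | 0 | 0 | 0 | 0 | 0 | 0 | 188948 | 192815 | 208365 | 88004 | 31237 | 15980 | 8500 | 20000 | 5003 | 3047 | 5000 | 1000 | 0
29996 | 29997 | 150000 | 1 | 3 | 2 | 43 | -1 | -1 | -1 | -1 | 0 | 0 | 1683 | 1828 | 3502 | 8979 | 5190 | 0 | 1837 | 3526 | 8998 | 129 | 0 | 0 | 0
29997 | 29998 | 30000 | 1 | 2 | 2 | 37 | 4 | 3 | 2 | -1 | 0 | 0 | 3565 | 3356 | 2758 | 20878 | 20582 | 19357 | 0 | 0 | 22000 | 4200 | 2000 | 3100 | 1
29998 | 29999 | 80000 | 1 | 3 | 1 | 41 | 1 | -1 | 0 | 0 | 0 | -1 | -1645 | 78379 | 76304 | 52774 | 11855 | 48944 | 85900 | 3409 | 1178 | 1926 | 52964 | 1804 | 1
29999 | 30000 | 50000 | 1 | 2 | 1 | 46 | 0 | 0 | 0 | 0 | 0 | 0 | 47929 | 48905 | 49764 | 36535 | 32428 | 15313 | 2078 | 1800 | 1430 | 1000 | 1000 | 1000 | 1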